AN EXPERIMENTAL ANALYSIS OF ANSWER-CHANGING
BEHAVIOR ON OBJECTIVE TESTS



Stanley S. Jacobs 
University of Pittsburgh 



The study was an experimental investigation of the effects of
item difficulty and subject ability on Ss' answer-changing
behaviors. Ss were administered an achievement test composed of
items at three levels of difficulty via slides, followed by a
printed copy of the test. Analyses revealed no effects
attributable to subject ability. Item difficulty was related to
both frequency and quality of change. Fewest answers were
changed for the easiest items, with the greatest number of changes
and points gained on moderately difficult items. A generally
inverse relationship appeared between quality of change and
difficulty. Ss were unable to predict the outcome of their
answer-changing; while approximately 50% felt they typically
lost, on the average all Ss gained regardless of their
opinion.



There seems to be some feeling on the part of students that initial
decisions concerning objective test items are usually correct, although
apparently the only published data on this point are those of Mathews (1929).
Mathews' data, however, fail to support students' opinions, i.e., it is
apparently advisable to change one's responses, since the typical result is an
improvement in test score.


Writers on the topic of test-taking behaviors are not always in
agreement concerning the advisability of answer-changing. Huff (1961), in
a widely-read guide for test-takers, implies that it is usually inadvisable
to change answers. Millman, Bishop and Ebel (1965), however, suggest that
the tendency to evaluate and judiciously change one's item responses is a
basic aspect of test-wiseness.

A number of studies (Lehman, 1928; Mathews, 1929; Jarrett, 1948; Reile
and Briggs, 1952; Bath, 1967) have concluded that there is a relationship
between total test scores and the quality of changes made. That is, better
students gain more than poorer students when answers are changed. However,
one must be aware of a possible tautology, since the stratifying variable
(total test score) is simply the summation of item scores which are affected
by the changes made.

These studies, as well as those of Berrein (1939) and Lowe and Crawford
(1939), have demonstrated that the general result of answer-changing
behavior is a higher test score.

The present study was designed to investigate the inter-relationship
of item difficulty, ability level of Ss and answer-changing behavior on an
objective achievement test, with some degree of control maintained over the
decision-making process.

Methodology 

Subjects 

The sample of 50 Ss involved in the present study was drawn from the
enrollment of the introductory graduate course in educational research at the
University of Pittsburgh. Participation in research was a part of course
requirements.

Procedure 

In the first week of the term, all Ss completed the Quick Word Test
(QWT) (Borgatta and Corsini, 1964), a 100-item, 4-option multiple-choice
vocabulary test. Following the first course examination, Ss were requested
to compose a brief note detailing their opinion concerning answer-changing
on objective tests, ending with a statement indicating whether S felt the
net result was a gain or loss in test score, or whether the result was
unknown to S.

Approximately four weeks later, Ss completed an examination dealing
with measurement concepts composed of 45 4-option multiple-choice items. The
items were drawn from a larger pool of items for which item analysis data
were available so that the test contained 15 easy items (p = .75), 15 items
of moderate difficulty (p = .49) and 15 very difficult items (p = .29). All
items were positively discriminating and an attempt was made to maintain
content validity. Items were randomly ordered within the test.
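The p values quoted for the three item groups are read here as the conventional item difficulty index, i.e., the proportion of examinees answering an item correctly. The sketch below shows that computation under this assumption; the function name, answer key and responses are hypothetical and are not data from the study.

```python
def item_difficulty(responses, key):
    """Return, for each item, the proportion of examinees answering correctly."""
    n_examinees = len(responses)
    return [
        sum(1 for answers in responses if answers[i] == correct) / n_examinees
        for i, correct in enumerate(key)
    ]


# Hypothetical data: 4 examinees answering 3 four-option items.
key = ["A", "C", "B"]
responses = [
    ["A", "C", "B"],
    ["A", "C", "D"],
    ["A", "B", "D"],
    ["A", "D", "D"],
]

p_values = item_difficulty(responses, key)
print(p_values)
```

A higher p denotes an easier item, which is consistent with the "easy" group having the largest value.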

Items were reproduced singly on 2 x 2 slides. Items with a total word
count of 25 or greater were produced as black on white slides, and given an
exposure time of 45 seconds. Items with a word count of less than 20 were
produced as white on black slides, and exposed for 30 seconds. Ss were
informed of the mode of testing and the exposure times and cues. They were
told that they would see the slides only once, and were instructed to read
the items rapidly but carefully and to answer all items.

Slides were presented using a remotely-controlled Kodak Carousel Model
850 projector with an Endalight screen. Timing was done using a Sears Model
19902 stop-watch.

Upon completion of the 45-item test, Ss were informed they would have
the opportunity to reconsider their answers. Black electrographic pencils
used to complete the test were collected, and Ss received a mimeographed
copy of the test and a red pencil. Any changed answers were to be recorded
in red, without erasing initial responses, allowing the determination of the
frequency and quality of changed answers.

Analysis 

A 2 x 3 x 3 three-dimensional chi-square was developed with the
following dimensions.

(1) ability: Ss were divided at the median of the QWT scores into low and
    high ability groups.

(2) type of change: Wrong-to-right, right-to-wrong, and wrong-to-wrong
    categories were established for changed responses.

(3) item difficulty: Items were categorized as being of low, moderate or
    high difficulty, based upon analysis information provided by
    a similar group two terms earlier.
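The first two dimensions can be illustrated with a short sketch. It assumes that each S's initial (black pencil) and final (red pencil) response to an item is available together with the keyed answer. The function names are hypothetical, and assigning scores that fall exactly at the median to the low group is an assumption, since the paper does not say how ties were handled.

```python
from statistics import median


def classify_change(initial, final, correct):
    """Classify one answer change; return None when the answer was not changed."""
    if initial == final:
        return None
    if initial == correct:
        return "right-to-wrong"
    if final == correct:
        return "wrong-to-right"
    return "wrong-to-wrong"


def ability_groups(qwt_scores):
    """Label each S 'high' or 'low' by a median split (ties are placed in 'low')."""
    cut = median(qwt_scores)
    return ["high" if score > cut else "low" for score in qwt_scores]


print(classify_change("A", "B", "B"))    # initial wrong, final right
print(ability_groups([60, 70, 80, 90]))  # median is 75
```

Only changed responses receive a category, so unchanged items do not enter the frequency counts.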

A two-way ANOVA for repeated measures was employed to analyze net
gains realized through answer-changing, as a function of subject ability and
level of item difficulty. Extreme groups of n=15 were formed for the ability
variable.

A one-way ANOVA was employed to analyze net gains made by Ss previously
reporting gain, loss, or no decision concerning their answer-changing
behavior. Due to absences when the initial reports were collected, the n for
this analysis is 44 rather than 50.

Results 

Of the five chi-squares calculated, only two were significant at the
.05 level: the χ² between dimensions (2) and (3), and the total χ². Since
the other dimensions appeared independent and the interaction χ² was
non-significant, the total χ²'s significance may be attributed to the
dependence between dimensions (2) and (3). (See Table 1)



TABLE 1

Frequency of Types of Answer Changes
Made to Items of Low, Moderate and High Difficulty

                        Level of Item Difficulty
Type of Change          Low     Moderate     High

Right-to-wrong           41        50         57
Wrong-to-right          134       184         93
Wrong-to-wrong           20        58         98

χ² = 68.2, p < .05
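The chi-square reported under Table 1 can be checked from the nine cell frequencies. The sketch below assumes the statistic is the ordinary Pearson chi-square for a 3 x 3 contingency table (type of change by item difficulty); the function name is illustrative only.

```python
# Observed frequencies from Table 1.
# Rows: type of change. Columns: low, moderate, high item difficulty.
observed = {
    "right-to-wrong": [41, 50, 57],
    "wrong-to-right": [134, 184, 93],
    "wrong-to-wrong": [20, 58, 98],
}


def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom) for a two-way frequency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = pearson_chi_square(observed)
print(f"chi-square = {chi2:.1f}, df = {df}")
```

With 4 degrees of freedom the tabled critical value at the .05 level is 9.488, so a statistic of this size is well beyond it.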

As may be seen in Table 1, there is a marked tendency to change
incorrect responses to correct responses, with the quality of changes showing
a gradual deterioration as item difficulty increases. As one might expect,
fewest answers are changed for the easiest items, and the amount gained is
least for the difficult items. (See Table 2)
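The pattern described here can be read directly off the Table 1 frequencies. The sketch below assumes one point per item, so that the net gain at a difficulty level is the number of wrong-to-right changes minus the number of right-to-wrong changes; wrong-to-wrong changes leave the score unchanged. The variable names are illustrative.

```python
# Frequencies from Table 1, ordered low, moderate, high item difficulty.
levels = ["low", "moderate", "high"]
right_to_wrong = [41, 50, 57]
wrong_to_right = [134, 184, 93]
wrong_to_wrong = [20, 58, 98]

summary = {}
for i, level in enumerate(levels):
    total = right_to_wrong[i] + wrong_to_right[i] + wrong_to_wrong[i]
    summary[level] = {
        "changes": total,                                    # all changed answers
        "net_gain": wrong_to_right[i] - right_to_wrong[i],   # points gained minus lost
        "prop_wrong_to_right": wrong_to_right[i] / total,    # quality of change
    }

for level in levels:
    row = summary[level]
    print(level, row["changes"], row["net_gain"],
          f"{row['prop_wrong_to_right']:.3f}")
```

Under this reading, the easiest items show the fewest changes, the moderately difficult items the largest net gain, and the proportion of favorable changes falls as difficulty rises.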

TABLE 2

Summary of Net Gains Resulting from Changes, for
Three Levels of Item Difficulty and Two Levels of Subject Ability

                              Level of Item Difficulty
                        Low           Moderate           High
Level of Ability    Mean   s.d.     Mean   s.d.      Mean   s.d.

High                1.93   2.12     2.47   1.81       .33   1.91
Low                 1.47   1.73     3.47   2.72      1.20   2.43

As summarized in Table 3, there is no significant difference
attributable to subject ability in net gains when answers are changed, but
there are significant differences attributable to the level of item
difficulty. The greatest gains are realized when Ss change the answers to
items of moderate difficulty, the least when answers to very difficult items
are changed.

TABLE 3

Repeated Measures ANOVA Summary Table for Effects of
Subject Ability and Level of Item Difficulty on Net Gain Scores
(Conservative Test (Winer, 1962))

Source                   df       MS        F

Ability (A)               1      4.90      .89
Error A                  28      5.48
Level of Diff. (B)        2     35.58     8.57
A x B                     2      4.93     1.19
Error B                  56      4.15
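The F ratios in Table 3 follow from dividing each mean square by its error term, which gives a way to check the entries. The sketch below uses only the mean squares that are clearly legible in the table. It also assumes that the "conservative test" is the procedure in which the repeated-measures F is referred to reduced degrees of freedom (1 and 28 here, instead of 2 and 56); the critical value of 4.20 is the tabled .05 value of F for 1 and 28 degrees of freedom.

```python
# Mean squares as given in Table 3.
ms_ability, ms_error_a = 4.90, 5.48        # between-subjects effect and its error
ms_difficulty, ms_error_b = 35.58, 4.15    # within-subjects effect and its error

f_ability = ms_ability / ms_error_a
f_difficulty = ms_difficulty / ms_error_b

# Design: 2 ability groups, 15 Ss per group, 3 levels of item difficulty.
n_groups, n_per_group, n_levels = 2, 15, 3
usual_df = (n_levels - 1, n_groups * (n_per_group - 1) * (n_levels - 1))
conservative_df = (1, n_groups * (n_per_group - 1))  # both df divided by (levels - 1)

F_CRIT_1_28 = 4.20  # tabled F at the .05 level for 1 and 28 df

print(f"F(ability)    = {f_ability:.2f}")
print(f"F(difficulty) = {f_difficulty:.2f}, usual df {usual_df}, "
      f"conservative df {conservative_df}")
print("difficulty significant under conservative test:",
      f_difficulty > F_CRIT_1_28)
```

Because the difficulty effect exceeds the critical value even with the reduced degrees of freedom, the conclusion does not depend on the usual repeated-measures assumptions holding.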





Students are apparently unable to predict the outcome of their
answer-changing behavior accurately. As seen in Tables 4 and 5, all groups
gain as a result of answer-changing, and the differences among groups are
non-significant.
test, for repeated measures, showed the locus of the significance
to be between moderately difficult and highly difficult
TABLE 4

Summary of Net Gains Over Total Test for
Those Ss Reporting a Typical Gain, Loss, or No Opinion

        Gain                   Loss                Do Not Know
  n     Mean   s.d.      n     Mean   s.d.      n     Mean   s.d.

 13     6.0    2.9      20     4.8    3.1      11     5.5    4.3


TABLE 5

One-way ANOVA Testing Differences in Actual
Gains Among Groups Reporting Gain, Loss, or No Opinion

Source        df       MS        F

Between        2      6.34     0.56
Within        41     11.27
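The one-way analysis in Table 5 can be approximated from the group summaries in Table 4. The sketch below assumes the reported s.d. values are sample standard deviations (n - 1 denominator) and takes the size of the "do not know" group as 11, the value implied by the total n of 44. Because the means and s.d.s are rounded to one decimal, the mean squares it produces only approximate the tabled 6.34 and 11.27.

```python
# Group summaries from Table 4: (n, mean net gain, s.d.).
groups = {
    "gain": (13, 6.0, 2.9),
    "loss": (20, 4.8, 3.1),
    "do not know": (11, 5.5, 4.3),  # n inferred from the total n of 44
}


def one_way_anova(summaries):
    """One-way ANOVA from (n, mean, s.d.) summaries of independent groups."""
    n_total = sum(n for n, _, _ in summaries)
    grand_mean = sum(n * mean for n, mean, _ in summaries) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in summaries)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in summaries)
    df_between = len(summaries) - 1
    df_within = n_total - len(summaries)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return df_between, df_within, ms_between, ms_within, ms_between / ms_within


df_b, df_w, ms_b, ms_w, f_ratio = one_way_anova(list(groups.values()))
print(f"df = ({df_b}, {df_w}), MS between = {ms_b:.2f}, "
      f"MS within = {ms_w:.2f}, F = {f_ratio:.2f}")
```

An F ratio below 1 cannot be significant, so the approximation agrees with the table: the groups' actual gains do not differ reliably.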



Discussion of Results 

Although the generalizability of the present study may be somewhat
limited due to the unique testing procedure employed, it was deemed of
greater importance to first insure some degree of internal validity for the
study. The previous work cited depended upon post-hoc examinations of test
papers to determine frequency and quality of answer change. Aside from the
questionable reliability of the procedure, it seems to assume that if any
answers are changed, the student must put pen to paper. In view of the
question of "overt" versus "covert" changes of mind, it was decided to
(1) imply to students that they would see items only once, (2) pace them
carefully through those items and (3) force a response to the item, followed
by an opportunity to reconsider initial answers in a manner readily
detectable by the experimenter.


The present study indicates that, student opinion notwithstanding,
students should be allowed and encouraged to reconsider answers to
multiple-choice items. The improvement in scores may be greatest on somewhat
speeded tests composed of moderately difficult items. If one were interested
in the best approximation of a "true score," it would seem advisable to
reduce the degree of speededness as much as possible. It also appears that
verbal ability of the type measured in the present study is unrelated to
gains made in answer changing. The question of achievement of Ss and gains
was not investigated. However, one must be aware of a possible "ceiling
effect," i.e., better students may make far fewer changes, thereby gaining
less.